
[ROCm] Remove BF16 workarounds for PaddleOCR-VL on HIP#5096

Open
austin1997 wants to merge 1 commit into PaddlePaddle:develop from austin1997:bf16-rocm-cleanup

Conversation

@austin1997

Summary

PaddlePaddle/Paddle#78711 closes the remaining HIP BF16 operator gaps at the framework level (BF16 kernels registered for layer_norm / softmax; conv2d_add[_act]_fuse_pass no longer registered on HIP wheels). Together with the already-merged PaddlePaddle/Paddle#78587 (BF16 conv kernels), end-to-end BF16 inference for PaddleOCR-VL now runs on AMD GPUs. This PR removes the two corresponding ROCm-specific BF16 workarounds from PaddleX.

Fixes #5095. Depends on PaddlePaddle/Paddle#78711 and PaddlePaddle/Paddle#78587: this PR can only be merged after both have landed and shipped in a release — otherwise the BF16 visual subgraph still crashes on older wheels.

Changes

  • paddlex/inference/models/doc_vlm/modeling/paddleocr_vl/_paddleocr_vl.py: remove `_keep_in_fp32_modules = ["visual", "mlp_AR"]`, so the SigLIP vision tower and the mlp_AR projector follow the model dtype instead of being forced to FP32.
  • paddlex/inference/models/runners/paddle_static/runner.py: remove the four `if paddle.is_compiled_with_rocm(): config.delete_pass("conv2d_add_act_fuse_pass"); config.delete_pass("conv2d_add_fuse_pass")` blocks (lines 406-408, 462-464, 496-498, 505-507).

CUDA inference behavior is completely unchanged.
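The removed keep-list works roughly like this — a minimal sketch, not PaddleX's actual casting code; the function and the module names here are illustrative:

```python
# Sketch of a `_keep_in_fp32_modules`-style keep-list: submodules whose
# names match the keep-list are pinned to FP32 while the rest of the
# model is cast to the target dtype. Illustrative only, not the real API.

def cast_model(module_names, target_dtype, keep_in_fp32=()):
    """Return a {module_name: dtype} map after casting, honoring the keep-list."""
    out = {}
    for name in module_names:
        if any(name == k or name.startswith(k + ".") for k in keep_in_fp32):
            out[name] = "float32"  # workaround: pin this module to FP32
        else:
            out[name] = target_dtype  # follow the model dtype
    return out

module_names = ["visual.encoder", "mlp_AR", "language_model.layers.0"]

# With the old ROCm workaround: vision tower + projector stay FP32.
with_workaround = cast_model(
    module_names, "bfloat16", keep_in_fp32=("visual", "mlp_AR")
)

# After this PR: every module follows the model dtype (BF16).
without_workaround = cast_model(module_names, "bfloat16")
```

With the keep-list removed, the `cast_model` call degenerates to the second form: no module is special-cased and the whole model runs in the loaded dtype.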

Relationship to existing PRs

Test plan

The PaddleOCR-VL pipeline previously needed two ROCm-specific escape hatches:

* `_keep_in_fp32_modules = ["visual", "mlp_AR"]` on
  PaddleOCRVLForConditionalGeneration kept the SigLIP vision tower and the
  multimodal projector in FP32 because BF16 layer_norm and BF16 softmax were
  not registered for HIP, so running the vision encoder in BF16 crashed.
* Four `paddle.is_compiled_with_rocm()` blocks in
  `paddlex/inference/models/runners/paddle_static/runner.py` (lines 406-408,
  462-464, 496-498, 505-507) called
  `delete_pass("conv2d_add_act_fuse_pass")` and
  `delete_pass("conv2d_add_fuse_pass")` because both PIR passes rewrite
  conv2d+add[+act] into the `fused_conv2d_add_act` op, which only has a
  cuDNN GPUDNN kernel — kernel dispatch then failed on ROCm.
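The dispatch failure described above can be modeled with a toy graph rewrite — illustrative names only, not Paddle's real pass or kernel registries:

```python
# Toy model of why the fusion passes broke ROCm: the fused op only has a
# cuDNN kernel, so once the pass rewrites the graph, kernel lookup fails
# on a HIP (MIOpen) build. All registry contents are illustrative.

KERNELS = {
    "conv2d": {"cudnn", "miopen"},
    "add": {"cudnn", "miopen"},
    "fused_conv2d_add_act": {"cudnn"},  # no MIOpen kernel registered
}

def run_fusion_pass(ops, pass_enabled):
    """Rewrite a conv2d+add prefix into the fused op when the pass runs."""
    if pass_enabled and ops[:2] == ["conv2d", "add"]:
        return ["fused_conv2d_add_act"] + ops[2:]
    return ops

def dispatch(ops, backend):
    """Raise if any op lacks a kernel for the backend."""
    for op in ops:
        if backend not in KERNELS.get(op, set()):
            raise RuntimeError(f"no {backend} kernel for {op}")
    return ops

graph = ["conv2d", "add"]

# CUDA build: fusion runs and the cuDNN kernel exists.
cuda_ops = dispatch(run_fusion_pass(graph, pass_enabled=True), "cudnn")

# HIP build with the pass still registered: dispatch fails on the fused op.
try:
    dispatch(run_fusion_pass(graph, pass_enabled=True), "miopen")
    fused_ok_on_hip = True
except RuntimeError:
    fused_ok_on_hip = False

# HIP build after the framework fix (pass never runs): plain ops dispatch fine.
hip_ops = dispatch(run_fusion_pass(graph, pass_enabled=False), "miopen")
```

The old PaddleX workaround corresponds to forcing `pass_enabled=False` from the application side via `delete_pass`; the framework fix makes that the default on HIP builds.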

These are addressed at the framework level by the upstream Paddle BF16 fix
(layer_norm + softmax registration on HIP, plus gating both PIR passes on
PADDLE_WITH_CUDA so they no longer run on HIP builds). With that wheel
installed, both PaddleX workarounds become unnecessary:

* Drop `_keep_in_fp32_modules` so the vision encoder + multimodal projector
  run natively in BF16 on ROCm. End-to-end output matches the FP32-fallback
  path on PaddleOCR-VL-1.5 (validated on MI300X / gfx942 / ROCm 7.2). This
  overlaps with #5077; if #5077 lands first, the conflict is trivial.
* Drop all four `delete_pass` blocks under `paddle.is_compiled_with_rocm()`.
  Once the framework PR lands, the two passes are no longer registered on
  HIP wheels, so `delete_pass` becomes a no-op there.
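Why the old `delete_pass` calls become no-ops can be sketched with an illustrative pass registry (not Paddle internals) where registration is gated on the build flag:

```python
# Sketch: when pass registration is gated on the CUDA build flag
# (analogous to gating on PADDLE_WITH_CUDA), HIP builds never register
# the two fusion passes, so deleting them from the config is a no-op.

def registered_passes(with_cuda):
    """Return the set of passes registered for this (toy) build."""
    passes = {"some_other_pass"}
    if with_cuda:  # the framework fix: only register under CUDA
        passes |= {"conv2d_add_act_fuse_pass", "conv2d_add_fuse_pass"}
    return passes

def delete_pass(passes, name):
    """Remove a pass by name; silently a no-op if it was never registered."""
    passes.discard(name)
    return passes

# HIP build: the passes are absent, so the old workaround does nothing.
hip_passes = registered_passes(with_cuda=False)
delete_pass(hip_passes, "conv2d_add_act_fuse_pass")
delete_pass(hip_passes, "conv2d_add_fuse_pass")

# CUDA build: both passes stay registered, matching unchanged CUDA behavior.
cuda_passes = registered_passes(with_cuda=True)
```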

Requires the framework BF16 PR to be merged and released; with older Paddle
wheels the BF16 visual path will still crash on ROCm. CUDA behavior is
unchanged — both passes remain registered under PADDLE_WITH_CUDA, and the
vision encoder simply uses whatever dtype the model is loaded with.
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@paddle-bot

paddle-bot bot commented Apr 18, 2026

Thanks for your contribution!
